Abstract



It has often been argued that all techniques of standard setting are arbitrary and likely to yield different results for different techniques or persons. This paper deals with a related but hitherto ignored aspect, namely the possibility that Angoff or Nedelsky judges specify inconsistent probabilities, e.g., a low probability for an easy item but a large probability for a hard item. A latent trait method is proposed to estimate such misspecifications and to determine whether the judge has worked consistently. Results from an empirical study are given which indicate that serious errors of specification can be expected, and that these are considerably larger for the Nedelsky than for the Angoff technique.

Assessing Inconsistencies in Standard Setting
with the Angoff or Nedelsky Technique

This paper is concerned with the use of standard-setting techniques in objectives-based instructional programs. For such programs, a great variety of techniques has been proposed (for reviews, see Glass, 1978; Hambleton, 1980; Hambleton, Powell, & Eignor, 1979; Jaeger, 1979; Shepard, 1980a, 1980b). The emphasis in this paper will be on the Angoff (1971) and Nedelsky (1954) techniques. These two techniques, which are based on an item-by-item judgment of test content, are among the most popular techniques in use in objectives-based instruction.

It has been argued that all standard setting is arbitrary (Glass, 1978; Shepard, 1979, 1980a, 1980b). This is correct since standards ought to reflect learning objectives, and these ultimately rest on values and norms. In addition, the various standard-setting techniques available differ, more or less, in the conception of mastery underlying the way standards are obtained. Therefore, different results can be expected both for different techniques and for different persons using the same technique. This has been confirmed in many experiments (Andrew & Hecht, 1976; Brennan & Lockwood, 1980; Koffler, 1980; Saunders, Ryan, & Huynh, 1981; Skakun & Kling, 1980). In this paper we do not share the concern with inconsistent results due to differences between techniques or persons. Instead, the interest is in a related but hitherto ignored aspect of standard setting, namely the possibility of intrajudge inconsistency. Intrajudge inconsistency arises when the judge specifies probabilities of success on the items that are incompatible with each other and imply different, conflicting standards. An example of intrajudge inconsistency is a judge specifying a low probability of success for an easy item but a large probability for a hard item. These two judgments are obviously inconsistent: the former implies a low standard, whereas the latter indicates that a high standard should be set. Another example is a judge specifying approximately equal probabilities for highly discriminating items (of differing difficulties). Generally, inconsistencies such as in these examples are due to a discrepancy between the actual properties of the items and the judge's perception of them.

Thus far, no attention has been paid to the possibility of intrajudge inconsistency, and results of the Angoff or Nedelsky technique are generally employed without checking the quality of the judge. This may be due to the fact that classical test theory does not provide satisfactory methods for analyzing such inconsistencies. It is the purpose of this paper to show how latent trait theory can be used to decide whether Angoff or Nedelsky standards have been set consistently enough for use in practice and to assess for which items inconsistencies have occurred. The second purpose of the paper is to present empirical results showing how consistently the Angoff and Nedelsky techniques were used in a typical educational situation. In the following it is assumed that the reader is familiar with the elementary concepts from latent trait theory as well as the technical aspects of the Angoff and Nedelsky techniques (see the appendix). A fuller description of the method and the empirical results is given in van der Linden (1981a).
Method



As mentioned earlier, intrajudge inconsistencies arise when probabilities are specified that are incompatible with each other and imply different, conflicting standards. Figure 1 shows how this can be viewed from latent trait theory. In this example, $\theta_c$ denotes the level of mastery belonging to the borderline student whom the judge has in mind. From the item characteristic curve it follows that this borderline student has a probability of success equal to $p_i$. However, the judge specifies a probability equal to $p_i^{(s)}$. Now, a misspecification occurs if

$e_i = p_i^{(s)} - p_i$

is unequal to zero. It is easy to see that a judge is only consistent if no misspecifications occur, that is, if $e_i = 0$ for all items. As soon as misspecifications are obtained for some items, the judge is inconsistent in the sense that his probabilities do not imply the same mastery level and therefore cannot belong to one person. Thus, in order to decide whether a judge has worked consistently, a method is needed to assess the misspecifications $e_i$.
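As a purely illustrative case, with invented numbers that are not taken from the study, suppose the item characteristic curve gives the borderline student a probability of success of .80 on some item, while the judge specifies .50. The misspecification would then be:

```latex
e_i = p_i^{(s)} - p_i = .50 - .80 = -.30
```

A negative value of this kind would mean that the judge treated the item as harder for the borderline student than the model implies.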

The following steps summarize how latent trait theory can be used for this purpose:

1. A latent trait model is chosen, its parameters are estimated, and its fit is tested. Suppose that n items fit the model.

2. For these n items the Angoff or Nedelsky technique is used to specify for each item the probability of success $p_i^{(s)}$.

3. Using equations 1 and 2 (appendix), the Angoff or Nedelsky standard, $\tau_c$, is computed.

4. The hypothesis to be tested is that the judge has worked consistently, i.e., has specified correct probabilities of success. Note that under this hypothesis, the Angoff or Nedelsky standard technically is a true score (expected observed score). The true score standard $\tau_c$ is next transformed into a standard on the $\theta$-scale of the latent trait model via the estimated test characteristic curve (appendix, equation 3). Since the latent trait standard $\theta_c$ is no explicit function of $\tau_c$, trial values must be substituted for the former until the value of the latter is obtained. The task is simplified by the fact that $\theta_c$ is monotonically related to $\tau_c$. However, some computer programs standardly produce the estimated test characteristic curve, and in that case $\theta_c$ can simply be read off.

5. Next, substituting $\theta_c$ and the estimated item parameters into the model, the estimated probabilities $p_i$ are computed.

6. In order to determine whether the hypothesis of a consistent judge is tenable, a comparison between the subjective probabilities provided by the judge, $p_i^{(s)}$, and the objective probabilities estimated under the model, $p_i$, must be made. This can be done using the index of consistency $C_1$ (appendix, equation 5). $C_1$ is the degree to which the average absolute misspecification differs from its maximum possible value, measured on the standard interval [0, 1]. The closer the value of $C_1$ to zero, the less tenable the hypothesis is that the judge has worked consistently.

7. A special difficulty is associated with the use of the Nedelsky technique. This technique can provide only a limited number of possible probabilities of success, and inconsistencies may therefore be attributable to the discrete character of the technique rather than the judge's behavior. The index $\Delta$ (appendix, equation 6) can be used to assess how large a reduction of consistency has occurred because of the discrete character of the technique.

8. Finally, the pattern of differences between $p_i^{(s)}$ and $p_i$ is analyzed. Technically, these differences are the "residuals" left over after the hypothesis of a consistent judge has been fitted to the data. An analysis of this pattern can be used, for instance, to detect items with systematic errors across judges or items for which the judge needs additional training.
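The core of the procedure above (computing the standard, mapping it to the latent scale, and comparing judged with model probabilities) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the Rasch model with already estimated difficulties is used, the test characteristic curve is inverted by bisection on a search interval of [-10, 10] chosen here for convenience, and the consistency index is taken to have the form (M - E)/M, in line with its verbal description. All function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Rasch probability of success for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def tcc(theta, difficulties):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(rasch_prob(theta, b) for b in difficulties)


def theta_from_standard(tau_c, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Find theta_c with tcc(theta_c) = tau_c by bisection.

    Relies on the curve being monotone in theta. The bracket [lo, hi]
    is an assumption of this sketch, not part of the method itself.
    """
    if not tcc(lo, difficulties) < tau_c < tcc(hi, difficulties):
        raise ValueError("tau_c is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < tau_c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def consistency_check(judged, difficulties):
    """Return (index, theta_c, residuals) for one judge.

    judged: probabilities of success specified by the judge, one per item.
    difficulties: estimated Rasch difficulties of the same items.
    """
    n = len(judged)
    tau_c = sum(judged)  # standard as the sum of judged probabilities
    theta_c = theta_from_standard(tau_c, difficulties)
    model = [rasch_prob(theta_c, b) for b in difficulties]
    residuals = [ps - p for ps, p in zip(judged, model)]
    mean_abs_error = sum(abs(e) for e in residuals) / n
    # Largest possible |judged - model| per item is max(p, 1 - p).
    max_error = sum(max(p, 1.0 - p) for p in model) / n
    index = (max_error - mean_abs_error) / max_error  # assumed form
    return index, theta_c, residuals
```

A judge whose probabilities all correspond to one point on the latent scale would obtain an index of 1 and residuals of zero; the residuals list is what the final step of the procedure inspects item by item.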



Results

An empirical investigation was carried out to illustrate the above method and to compare results for the Nedelsky and Angoff techniques. Eight Angoff and nine Nedelsky judges were used, who each inspected the same 25-item test belonging to an instructional unit from a physics course introducing grade ten pupils to elementary mechanical concepts. All items were of the three- or four-choice type. A latent trait analysis, based on the responses of 156 pupils, produced 18 items showing a satisfactory fit to the Rasch model (appendix, equation 4). A more detailed description of the items and the design of the study is given in van der Linden (1981a, 1981b).

Table 1
Results for Nine Judges Using the Nedelsky Technique

Table 2
Estimated Probabilities of Success for Two Nedelsky Judges




















              Judge 2                          Judge 5
Item    p(s)    p     e(u)   e(l)       p(s)    p     e(u)   e(l)

  1      .50   .73    .73    .08         .33   .66    .66    .01
  2     1.00   .11    .89    .12         .33   .08    .92    .08
  3     1.00   .93    .93    .07        1.00   .90    .90    .10
  4      .50   .50    .50    .16         .50   .41    .59    .04
  5     1.00   .94    .94    .05        1.00   .92    .92    .08
  6      .50   .84    .84    .15         .50   .79    .79    .12
  7     1.00   .87    .87    .12         .50   .83    .83    .16
  8      .50   .92    .92    .07        1.00   .89    .89    .11
  9      .50   .71    .71    .05         .33   .63    .63    .04
 10      .50   .86    .86    .13         .50   .81    .81    .14
 11      .50   .74    .74    .01         .50   .67    .67    .08
 12      .50   .16    .84    .08         .50   .12    .88    .12
 13      .33   .82    .82    .17        1.00   .76    .76    .01
 14     1.00   .22    .78    .01         .33   .17    .83    .08
 15      .50   .26    .74    .02         .33   .20    .80    .05
 16      .25   .62    .62    .12         .50   .53    .53    .03
 17     1.00   .94    .94    .06        1.00   .91    .91    .09
 18      .25   .17    .83    .07         .25   .13    .87    .12






Table 3
Results for Eight Judges Using the Angoff Technique

Table 4
Estimated Probabilities of Success for Two Angoff Judges

Judge 2          Judge 7

Table 1 shows the results for the nine Nedelsky judges. The first column
gives the average absolute errors of specification (E); the next columns show the values for the consistency index ($C_1$) and the reduction in consistency due to the discrete character of the Nedelsky technique ($\Delta$). The mean error of specification for all nine judges was no less than .25. The mean value of $\Delta$ was equal to .09.

Table 2 contains the values of $p_i^{(s)}$ and $p_i$ for the least consistent as well as the most consistent judge. The last two columns show the upper ($e_i^{(u)}$) and lower ($e_i^{(l)}$) bounds to the estimated misspecification ($p_i^{(s)} - p_i$) for the individual items. Note the larger variability of these specification errors for the worst judge.

The results for the eight Angoff judges are given in Table 3. The mean absolute error for all eight judges was equal to .18 and thus less serious than for the Nedelsky technique. Correspondingly, the values for $C_1$ are higher than the ones in Table 1. Table 4 gives more detailed information about the results for the least consistent and most consistent Angoff judges.

The conclusion from the above findings is that when using the Angoff or Nedelsky technique one has to reckon with serious misspecifications of the probabilities of success from which the standards are computed. On the whole, these errors are however noticeably less unfavorable for the Angoff than for the Nedelsky technique, the explanation being the fact that the latter admits only discrete probabilities and thus always forces the judge to be inconsistent to some extent.
Discussion



The method proposed in this paper can be used for several purposes. An obvious possibility is a routine check of standard-setting results before they are used in educational practice. Other possibilities are, for example: (1) selecting judges meeting predetermined criteria of consistency, (2) evaluating programs for training judges, (3) assessing consequences of modifying standard-setting techniques, or (4) item analysis to detect items with systematic errors across judges or techniques.

For all these applications of the method, it is necessary that the items fit the latent trait model. However, if some of the items do not satisfactorily fit the model, the method can still be used for the other items in the test. The only modification necessary is the computation of a new standard skipping the items not fitting the model. The estimation of the errors of specification and the consistency index are based on the new standard, and these estimates, then, still give an impression of how consistently the judge has worked.
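The modification described in the last paragraph amounts to dropping the non-fitting items before the standard is recomputed. A minimal Python sketch of that bookkeeping follows; the function name and the representation of model fit as a list of booleans are assumptions made for illustration only.

```python
def restrict_to_fitting(judged, difficulties, fits):
    """Keep only the items that fit the latent trait model.

    judged: probabilities of success specified by the judge, one per item.
    difficulties: estimated item difficulties, in the same order.
    fits: True for items that fit the model, False otherwise (hypothetical flag).
    Returns the reduced lists and the new standard, i.e. the sum of the
    judged probabilities over the remaining items.
    """
    kept = [(ps, b) for ps, b, ok in zip(judged, difficulties, fits) if ok]
    judged_kept = [ps for ps, _ in kept]
    difficulties_kept = [b for _, b in kept]
    new_standard = sum(judged_kept)
    return judged_kept, difficulties_kept, new_standard
```

The errors of specification and the consistency index would then be estimated from the reduced lists and the new standard in the same way as for the full set of fitting items.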
Appendix: Main Formulas and Equations



Angoff standard

For a test of length n, the Angoff standard is equal to

(1) $\sum_{i=1}^{n} p_i^{(s)}$,

where $p_i^{(s)}$ is the borderline student's probability of success as specified by the judge.

Nedelsky standard

The Nedelsky standard is equal to (1) with

(2) $p_i^{(s)} = 1/(q_i - k_i)$,

where $q_i$ is the number of alternatives of item i and $k_i$ is the number of alternatives of which the judge indicates that the borderline student knows they are incorrect.

Test characteristic curve

For a student with $\theta_c$, it holds that

(3) $\sum_{i=1}^{n} P_i(+\mid\theta_c) = \sum_{i=1}^{n} E(u_i\mid\theta_c) = E\left(\sum_{i=1}^{n} u_i \mid \theta_c\right) = E(X\mid\theta_c) = \tau_c$,

where $P_i(+\mid\theta_c)$ is the probability of a correct response to item i, $E(\cdot)$ is the expected value operator, $u_i$ is the item response variable (1 = correct, 0 = incorrect), and $\tau$ is the classical test theory true score.

Rasch model

(4) $P_i(+\mid\theta_c) = \{1 + \exp[-(\theta_c - b_i)]\}^{-1}$,

where $b_i$ is the difficulty parameter of item i.

Index of consistency

(5) $C_1 = \dfrac{M - E}{M}$,

where

$E = \sum_{i=1}^{n} |p_i^{(s)} - p_i| / n$;

$M = \sum_{i=1}^{n} e_i^{(u)} / n$;

$e_i^{(u)} = \max\{p_i, 1 - p_i\}$.

Note that $e_i^{(u)}$ is the maximum value of $|p_i^{(s)} - p_i|$ and that M is the maximum value of E.

Reduction in consistency

For the Nedelsky technique the reduction in consistency due to the discrete character of its probabilities is equal to

(6) $\Delta = C_2 - C_1$,

where

$m = \sum_{i=1}^{n} e_i^{(l)} / n$;

and $k_i^*$ is the value of $k_i$ in (2) chosen such that $|p_i^{(s)} - p_i|$ is minimal. Note that $e_i^{(l)}$ is the minimum value of $|p_i^{(s)} - p_i|$ and that m is the minimum value of E.
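The lower bound described in the note above can be made concrete with a short Python sketch. It assumes the usual Nedelsky rule that a judge who eliminates k of the q alternatives of an item specifies a probability of 1/(q - k), with k ranging from 0 to q - 1; the function names are hypothetical and the sketch covers only the quantities $e_i^{(l)}$ and m, not the index in (6).

```python
def nedelsky_values(q):
    """All probabilities a judge can express for a q-choice item,
    assuming the rule 1 / (q - k) for k = 0, ..., q - 1 eliminated alternatives."""
    return [1.0 / (q - k) for k in range(q)]


def lower_bound(p, q):
    """Smallest attainable |p_s - p| for a q-choice item.

    p: model probability of success for the borderline student.
    Returns (bound, k_star), where k_star is the number of eliminated
    alternatives that brings the Nedelsky probability closest to p.
    """
    k_star = min(range(q), key=lambda k: abs(1.0 / (q - k) - p))
    return abs(1.0 / (q - k_star) - p), k_star


def mean_lower_bound(model_probs, n_alternatives):
    """Average of the per-item lower bounds, i.e. the smallest mean
    absolute misspecification the technique allows for these items."""
    bounds = [lower_bound(p, q)[0] for p, q in zip(model_probs, n_alternatives)]
    return sum(bounds) / len(bounds)
```

For a four-choice item the attainable probabilities are 1/4, 1/3, 1/2, and 1, so a model probability of, say, .53 cannot be matched more closely than by specifying .50.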